50 research outputs found

    Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain

    Word embeddings have made enormous inroads in recent years in a wide variety of text mining applications. In this paper, we explore a word embedding-based architecture for predicting the relevance of a role between two financial entities within the context of natural language sentences. In this extended abstract, we propose a pooled approach that uses a collection of sentences to train word embeddings using the skip-gram word2vec architecture. We use the word embeddings to obtain context vectors that are assigned one or more labels based on manual annotations. We train a machine learning classifier using the labeled context vectors, and use the trained classifier to predict contextual role relevance on test data. Our approach serves as a good minimal-expertise baseline for the task as it is simple and intuitive, uses open-source modules, requires little feature crafting effort and performs well across roles. Comment: DSMM 2017 workshop at ACM SIGMOD conference
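    The pipeline described in this abstract (embeddings, then labeled context vectors, then a classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy 3-dimensional embeddings stand in for vectors that would come from skip-gram training, averaging is one simple way to build a context vector, the two labels are hypothetical, and a nearest-centroid rule is swapped in for whatever classifier the authors used.

```python
from math import sqrt

# Hypothetical toy embeddings; real ones would be learned with skip-gram word2vec.
EMBEDDINGS = {
    "acquired": [0.9, 0.1, 0.0],
    "bought": [0.8, 0.2, 0.1],
    "shares": [0.7, 0.0, 0.3],
    "sued": [0.1, 0.9, 0.0],
    "litigation": [0.0, 0.8, 0.2],
    "court": [0.1, 0.7, 0.1],
}
DIM = 3


def context_vector(tokens):
    """Average the embeddings of the known tokens (assumed pooling scheme)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def train_centroids(labeled_contexts):
    """Compute one mean context vector per label (nearest-centroid stand-in)."""
    by_label = {}
    for tokens, label in labeled_contexts:
        by_label.setdefault(label, []).append(context_vector(tokens))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(centroids, tokens):
    """Return the label whose centroid is most similar to the context vector."""
    vec = context_vector(tokens)
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical manual annotations for a single role.
TRAIN = [
    (["acquired", "shares"], "relevant"),
    (["sued", "court"], "not_relevant"),
]
CENTROIDS = train_centroids(TRAIN)
```

    Calling predict(CENTROIDS, ["bought"]) assigns a label to an unseen context. The abstract mentions that a context may carry more than one label; this sketch handles only the single-label case.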

    Named Entity Resolution in Personal Knowledge Graphs

    Entity Resolution (ER) is the problem of determining when two entities refer to the same underlying entity. The problem has been studied for over 50 years, and most recently, has taken on new importance in an era of large, heterogeneous 'knowledge graphs' published on the Web and used widely in domains as wide-ranging as social media, e-commerce and search. This chapter will discuss the specific problem of named ER in the context of personal knowledge graphs (PKGs). We begin with a formal definition of the problem, and the components necessary for doing high-quality and efficient ER. We also discuss some challenges that are expected to arise for Web-scale data. Next, we provide a brief literature review, with a special focus on how existing techniques can potentially apply to PKGs. We conclude the chapter by covering some applications, as well as promising directions for future research. Comment: To appear as a book chapter by the same name in an upcoming (Oct. 2023) book 'Personal Knowledge Graphs (PKGs): Methodology, tools and applications' edited by Tiwari et al.

    Using Contexts and Constraints for Improved Geotagging of Human Trafficking Webpages

    Extracting geographical tags from webpages is a well-motivated application in many domains. In illicit domains with unusual language models, like human trafficking, extracting geotags with both high precision and recall is a challenging problem. In this paper, we describe a geotag extraction framework in which context, constraints and the openly available Geonames knowledge base work in tandem in an Integer Linear Programming (ILP) model to achieve good performance. In preliminary empirical investigations, the framework improves precision by 28.57% and F-measure by 36.9% on a difficult human trafficking geotagging task compared to a machine learning-based baseline. The method is already being integrated into an existing knowledge base construction system widely used by US law enforcement agencies to combat human trafficking. Comment: 6 pages, GeoRich 2017 workshop at ACM SIGMOD conference
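    To make the idea of combining context scores with constraints concrete, here is a small sketch of the kind of 0/1 selection problem an ILP model can encode. It is not the authors' formulation: the candidate table, the scores, and the "all selected tags share one country" constraint are hypothetical, and exhaustive enumeration is used in place of an ILP solver, which is only workable for tiny inputs.

```python
from itertools import product

# Hypothetical candidates per mention: (gazetteer entry, country code, context score).
CANDIDATES = {
    "Paris": [("Paris, FR", "FR", 0.6), ("Paris, TX", "US", 0.5)],
    "Austin": [("Austin, TX", "US", 0.9)],
}


def best_assignment(candidates):
    """Pick at most one candidate per mention, maximizing the total score.

    Assumed constraint: all selected candidates must lie in the same country.
    Each mention may also be left untagged (the None option).
    """
    mentions = sorted(candidates)
    options = [[None] + candidates[m] for m in mentions]
    best, best_score = {}, 0.0
    for combo in product(*options):
        chosen = [c for c in combo if c is not None]
        if len({country for _, country, _ in chosen}) > 1:
            continue  # violates the single-country constraint
        score = sum(s for _, _, s in chosen)
        if score > best_score:
            best_score = score
            best = {m: c[0] for m, c in zip(mentions, combo) if c is not None}
    return best, best_score


ASSIGNMENT, SCORE = best_assignment(CANDIDATES)
```

    In this toy input, "Paris, FR" has the higher individual score, but the constraint makes "Paris, TX" the better choice once "Austin, TX" is selected. That interaction between local context scores and global constraints is what a joint model captures and a per-mention classifier does not.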

    Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes

    Recent work on transformer-based neural networks has led to impressive advances on multiple-choice natural language understanding (NLU) problems, such as Question Answering (QA) and abductive reasoning. Despite these advances, there is still limited work on understanding whether these models respond to perturbed multiple-choice instances in a sufficiently robust manner that would allow them to be trusted in real-world situations. We present four confusion probes, inspired by similar phenomena first identified in the behavioral science community, to test for problems such as prior bias and choice paralysis. Experimentally, we probe a widely used transformer-based multiple-choice NLU system using four established benchmark datasets. Here we show that the model exhibits significant prior bias and, to a lesser but still highly significant degree, choice paralysis, in addition to other problems. Our results suggest that stronger testing protocols and additional benchmarks may be necessary before the language models are used in front-facing systems or decision making with real-world consequences.

    Understanding Substructures in Commonsense Relations in ConceptNet

    Acquiring commonsense knowledge and reasoning is an important goal in modern NLP research. Despite much progress, there is still a lack of understanding (especially at scale) of the nature of commonsense knowledge itself. A potential source of structured commonsense knowledge that could be used to derive insights is ConceptNet. In particular, ConceptNet contains several coarse-grained relations, including HasContext, FormOf and SymbolOf, which can prove invaluable in understanding broad, but critically important, commonsense notions such as 'context'. In this article, we present a methodology based on unsupervised knowledge graph representation learning and clustering to reveal and study substructures in three heavily used commonsense relations in ConceptNet. Our results show that, despite having an 'official' definition in ConceptNet, many of these commonsense relations exhibit considerable substructure. In the future, therefore, such relations could be subdivided into other relations with more refined definitions. We also supplement our core study with visualizations and qualitative analyses. Comment: arXiv admin note: substantial text overlap with arXiv:2011.1408